{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/safevideo/autollm/blob/main/examples/quickstart.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. Preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Install latest version of autollm and some required packages for this tutorial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "!pip install autollm gradio gitpython nbconvert -Uqq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Import required modules:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import required functions, classes\n",
    "from autollm import AutoQueryEngine\n",
    "from autollm.utils.document_reading import read_github_repo_as_documents, read_files_as_documents\n",
    "import os\n",
    "import gradio as gr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Set your OpenAI API key in order to use default gpt-3.5-turbo model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Read Files as Documents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Reading from a Github repo:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Temporary directory created at autollm/temp\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:autollm.utils.document_reading:Found 221 'documents'.\n",
      "INFO:autollm.utils.document_reading:Operations complete, deleting temporary directory autollm/temp..\n"
     ]
    }
   ],
   "source": [
    "git_repo_url = \"https://github.com/ultralytics/ultralytics.git\"\n",
    "relative_folder_path = \"docs\"   # relative path from the repo root to the folder containing documents\n",
    "required_exts = [\".md\"]    # specify the extensions of the documents to be read \n",
    "\n",
    "documents = read_github_repo_as_documents(git_repo_url=git_repo_url, relative_folder_path=\"docs\", required_exts=required_exts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Reading from a local folder:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# required_exts = ...\n",
    "# documents = read_files_as_documents(input_dir=\"your/path/to/documents\", required_exts=required_exts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: If you want to read specific file types, adjust the `required_exts` parameter. By default, our functions will read all [supported file types](https://github.com/run-llama/llama_index/blob/main/llama_index/readers/file/base.py#L19-L34) from the specified source."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Configuration of AutoQueryEngine"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Basic Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- You can completely skip configuration(advanced usage) if you want to use default settings."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 🌟 **pro tip**: autollm defaults to lancedb as the vector store since it is lightweight, scales from development to production and is 100x cheaper than alternatives!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Parsing documents into nodes: 100%|██████████| 221/221 [00:01<00:00, 180.70it/s]\n",
      "Generating embeddings: 100%|██████████| 520/520 [00:42<00:00, 12.11it/s]\n"
     ]
    }
   ],
   "source": [
    "query_engine = AutoQueryEngine.from_parameters(documents=documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Advanced Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- You can configure the AutoQueryEngine to your needs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "system_prompt = \"You are an friendly ai assistant that help users find the most relevant and accurate answers to their questions based on the documents you have access to. When answering the questions, mostly rely on the info in documents.\"\n",
    "\n",
    "query_wrapper_prompt = '''\n",
    "The document information is below.\n",
    "---------------------\n",
    "{context_str}\n",
    "---------------------\n",
    "Using the document information and mostly relying on it,\n",
    "answer the query.\n",
    "Query: {query_str}\n",
    "Answer:\n",
    "'''\n",
    "\n",
    "enable_cost_calculator = True\n",
    "\n",
    "# llm params\n",
    "model = \"gpt-3.5-turbo\"\n",
    "\n",
    "# vector store params\n",
    "vector_store_type = \"LanceDBVectorStore\"\n",
    "# specific params for LanceDBVectorStore\n",
    "uri = \"tmp/lancedb\" \n",
    "table_name = \"vectors\"\n",
    "\n",
    "# service context params\n",
    "chunk_size = 1024\n",
    "\n",
    "# query engine params\n",
    "similarity_top_k = 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "llm_params = {\"model\": model}\n",
    "vector_store_params = {\"vector_store_type\": vector_store_type, \"uri\": uri, \"table_name\": table_name}\n",
    "service_context_params = {\"chunk_size\": chunk_size}\n",
    "query_engine_params = {\"similarity_top_k\": similarity_top_k}\n",
    "\n",
    "query_engine = AutoQueryEngine.from_parameters(documents=documents, system_prompt=system_prompt, query_wrapper_prompt=query_wrapper_prompt, enable_cost_calculator=enable_cost_calculator, llm_params=llm_params, vector_store_params=vector_store_params, service_context_params=service_context_params, query_engine_params=query_engine_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Ask Anything to Your Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LLM Prompt Token Usage: 1415\n",
      "LLM Completion Token Usage: 239\n",
      "LLM Total Token Cost: $0.002601\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"How to perform tiled inference with SAHI and YOLOv8, include code snippets\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "To perform tiled inference with SAHI and YOLOv8, you can use the `get_sliced_prediction` function provided by SAHI. Here is an example code snippet:\n",
      "\n",
      "```python\n",
      "result = get_sliced_prediction(\n",
      "    \"path/to/image.jpg\",\n",
      "    detection_model,\n",
      "    slice_height=256,\n",
      "    slice_width=256,\n",
      "    overlap_height_ratio=0.2,\n",
      "    overlap_width_ratio=0.2\n",
      ")\n",
      "```\n",
      "\n",
      "In this code snippet, you need to replace `\"path/to/image.jpg\"` with the actual path to your image file. The `detection_model` parameter should be the YOLOv8 model you want to use for object detection.\n",
      "\n",
      "The `slice_height` and `slice_width` parameters determine the dimensions of each slice. The `overlap_height_ratio` and `overlap_width_ratio` parameters specify the overlap ratios between adjacent slices.\n",
      "\n",
      "By running this code, SAHI will perform tiled inference on the image using YOLOv8, and the `result` object will contain the predicted bounding boxes and other information.\n",
      "\n",
      "Please note that you may need to install the necessary dependencies and import the required modules before using this code snippet.\n"
     ]
    }
   ],
   "source": [
    "print(response.response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Or play with it in the gradio app 🚀"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running on local URL:  http://127.0.0.1:7860\n",
      "\n",
      "To create a public link, set `share=True` in `launch()`.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><iframe src=\"http://127.0.0.1:7860/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import gradio as gr\n",
    "\n",
    "def greet(query):\n",
    "    return query_engine.query(query).response\n",
    "\n",
    "demo = gr.Interface(fn=greet, inputs=\"text\", outputs=\"text\")\n",
    "    \n",
    "demo.launch()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### If you found this project useful, [give it a ⭐️ on GitHub](https://github.com/safevideo/autollm) to show your support!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "aidocs",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
